From Experimental Assessment of Fault-Tolerant Systems to Dependability Benchmarking
Abstract
This short contribution first describes the role of fault injection among the dependability assessment methods that are pertinent to the definition and development of dependability benchmarks. Specific problems and challenges faced by dependability benchmarking are then identified and some relevant advances are discussed.

1. Fault Injection-based Experimental Assessment

Fault injection has long been recognized as a pragmatic way to assess the behavior of computer systems in the presence of faults. Indeed, such a controlled experimental approach provides a useful complement to other dependability assessment techniques, ranging from measurements to axiomatic methods, especially when used in combination with them [1]. In particular, fault injection can be seen as a means of testing fault tolerance mechanisms with respect to the special inputs they are meant to cope with: faults.

Many valuable efforts have been reported that address the use of fault injection to contribute to the validation of fault-tolerant systems, sometimes in cooperation with other techniques such as analytical modeling and formal verification [2]. Besides its contribution to the evaluation of fault-tolerant systems, fault injection has also proved very useful for characterizing the behavior of computerized systems and components in the presence of faults. Numerous techniques have been proposed [3], ranging from i) simulation-based techniques at various levels of representation of the target system (technological, logical, RTL, PMS, etc.), through ii) hardware techniques (e.g., pin-level injection, heavy-ion radiation, EMI, power supply alteration), to iii) software-implemented techniques that support the bit-flip model in memory elements; a minimal sketch of the latter is given at the end of this section.

Building on the advances made by these research efforts and on the actual benefits obtained, fault injection has made its way into industry, where it is now part of the development process of many providers, integrators and specifiers of dependable computer systems (e.g., Ansaldo Segnalamento Ferroviario, Astrium, Compaq/Tandem, ESA, Ericsson SAAB Space, Honeywell, IBM, Intel, NASA, Siemens, Sun and Volvo, to name just a few). This definitely confirms the pertinence of the approach.
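As an illustration of the software-implemented, bit-flip style of injection mentioned above, the following minimal Python sketch corrupts one randomly chosen bit of a copy of a target memory image, runs a workload on it, and classifies the outcome. The memory image, the run_workload callable and its detected/output_ok fields, and the four outcome categories are hypothetical placeholders chosen for illustration, not part of any tool cited here.

import random

def flip_bit(memory: bytearray, byte_index: int, bit_index: int) -> None:
    # Emulate a transient hardware fault: flip a single bit in the image.
    memory[byte_index] ^= 1 << bit_index

def run_experiment(golden_image: bytes, run_workload) -> str:
    # One injection run: corrupt a copy of the image, execute the workload,
    # and classify the result into a readout category.
    corrupted = bytearray(golden_image)
    flip_bit(corrupted,
             random.randrange(len(corrupted)),  # random fault location (byte)
             random.randrange(8))               # random bit within that byte
    try:
        result = run_workload(bytes(corrupted))  # hypothetical workload driver
    except Exception:
        return "crash"              # the workload aborted
    if result.detected:
        return "error_detected"     # a fault tolerance mechanism fired
    if result.output_ok:
        return "no_effect"          # output matches the fault-free run
    return "wrong_result"           # silent data corruption

A campaign simply repeats run_experiment over many randomly drawn fault locations and tallies the outcome frequencies; such tallies are the raw measurements from which dependability measures are then derived.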
2. Requirements and Challenges for Dependability Benchmarking

In spite of several pioneering efforts made during the 1990s (e.g., see [4-8]) and the related initiatives currently under way — e.g., the IFIP WG 10.4 Special Interest Group on Dependability Benchmarking (SIGDeB, see http://www.dependability.org/wg10.4/SIGDeB) and the DBench project [9] of the European IST Programme — there is still a significant gap between the level of recognition attached to robustness benchmarks and fault injection-based dependability benchmarking, on the one hand, and the wide offering and broad agreement that characterize performance benchmarks, on the other (e.g., see [10]). Much effort is clearly needed before the same standing can eventually be achieved.

Among the main properties that a dependability benchmark needs to support, one can distinguish: agreement among the dependability community; acceptance by end-users at large (providers, integrators, stakeholders, etc.); usefulness, by supporting the provision of meaningful measures; and fairness, by forming a consistent reference for assessing alternative solutions. Portability (ease of transfer among target systems) and usability (relative simplicity of application and interpretation) are among the other desired properties. Agreement and acceptance are definitely the ultimate properties to aim for, and usefulness and fairness are of course paramount in supporting them.

In practice, a dependability benchmark is precisely characterized by basic dimensions such as the workload, the faultload, the measurements and the measures. Clearly, in the context of dependability assessment, the usefulness and fairness properties are very much impacted by the determination of the faultload dimension. One important question toward this end is whether a limited set of injection techniques can be identified that is sufficient to generate the relevant faultload sets according to the desired properties of a benchmark, or whether distinct techniques are needed.

3. Characterization of the Faultload

Two main questions have to be considered:
• Fault representativeness (of a fault injection technique): to what extent are the induced errors similar to those provoked by real faults or by a representative fault model?
• Fault equivalence (of fault injection techniques): to what extent do distinct fault injection techniques lead to similar consequences (errors and failures)?

So far, the investigations comparing i) a specific fault injection technique with real faults (e.g., see [11-13]) and ii) several injection techniques with one another (e.g., see [14-16]) have shown mixed results: some techniques were found to be quite equivalent, while others were identified as rather complementary. Accordingly, it seems necessary i) to seek a more general consensus, in particular with the industry sector (including computer and component providers, integrators and end-users), and ii) to conduct a wider-scale controlled research effort on this problem. While the IFIP WG 10.4 SIGDeB provides a suitable forum for making progress on the first item, a proactive research incentive is required to support the second beyond the research projects currently being developed.

To set up the background, the presentation first briefly surveys the various dependability assessment methods that are pertinent to the definition and development of dependability benchmarks. Special attention is paid to identifying the role of fault injection in the assessment of fault-tolerant systems. The focus is then put on the characterization of the faultload attribute with respect to the fault representativeness and fault equivalence issues. Emphasis will be put on the related work being developed within the framework of the DBench project [9]. To tackle this problem, the project has defined a comprehensive set of controlled experiments encompassing several levels of application of accidental faults (VHDL simulation, pin-level fault injection, software-implemented fault injection and software mutation) on various target systems (hardware controllers for embedded applications, off-the-shelf operating systems and database applications); a minimal illustration of how such cross-technique comparisons can be quantified closes this section.

Some emerging challenges will also be identified that extend beyond this work. These include, for example, the impact of accidental failures on the static infrastructures and ad hoc means used to transfer and process information, not only from the point of view of availability, but also from the point of view of the potential threats posed to the protection mechanisms aimed at ensuring security.
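Returning to the fault equivalence question, one simple way to make it concrete, sketched below under purely illustrative assumptions (the outcome categories, the campaign counts and the choice of total variation distance as similarity metric are not taken from the DBench methodology), is to compare the failure-mode distributions that two injection techniques produce on the same target and workload.

from collections import Counter

OUTCOMES = ("no_effect", "error_detected", "wrong_result", "crash")

def distribution(outcomes: list[str]) -> dict[str, float]:
    # Per-category proportions observed in one injection campaign.
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {cat: counts.get(cat, 0) / total for cat in OUTCOMES}

def divergence(dist_a: dict[str, float], dist_b: dict[str, float]) -> float:
    # Total variation distance between two failure-mode profiles:
    # 0.0 means identical profiles, 1.0 means completely disjoint ones.
    return 0.5 * sum(abs(dist_a[c] - dist_b[c]) for c in OUTCOMES)

# Hypothetical campaign results for two techniques on the same target:
pin_level = distribution(40 * ["crash"] + 50 * ["no_effect"] + 10 * ["error_detected"])
swifi = distribution(35 * ["crash"] + 55 * ["no_effect"] + 10 * ["error_detected"])
print(f"divergence = {divergence(pin_level, swifi):.2f}")  # 0.05: similar profiles

A divergence near zero suggests that, for the measures of interest, one technique could stand in for the other in a benchmark faultload, whereas a large divergence indicates that the techniques are complementary and both may be needed.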
Acknowledgement. This work is partially supported by the DBench project (IST 2000-25425). Thanks go to Yves Crouzet, Jean-Charles Fabre, Karama Kanoun, Jean-Claude Laprie and Henrique Madeira for the insightful discussions concerning the issues addressed here.

References

[1] J. Arlat, A. Costes, Y. Crouzet, J.-C. Laprie and D. Powell, "Fault Injection and Dependability Evaluation of Fault-Tolerant Systems", IEEE Trans. on Computers, vol. 42, no. 8, pp. 913-923, August 1993.
[2] D. Powell, "Distributed Fault-Tolerance — Lessons from Delta-4", IEEE Micro, vol. 14, no. 1, pp. 36-47, February 1994.
[3] J. V. Carreira, D. Costa and J. G. Silva, "Fault Injection Spot-checks Computer System Dependability", IEEE Spectrum, vol. 36, pp. 50-55, August 1999.
[4] T. K. Tsai, R. K. Iyer and D. Jewitt, "An Approach Towards Benchmarking of Fault-Tolerant Commercial Systems", in Proc. FTCS-26, Sendai, Japan, 1996, pp. 314-323 (IEEE CS Press).
[5] A. Mukherjee and D. P. Siewiorek, "Measuring Software Dependability by Robustness Benchmarking", IEEE Trans. on Software Engineering, vol. 23, no. 6, pp. 366-378, June 1997.
[6] P. Koopman and J. DeVale, "Comparing the Robustness of POSIX Operating Systems", in Proc. FTCS-29, Madison, WI, USA, 1999, pp. 30-37 (IEEE CS Press).
[7] A. Brown and D. A. Patterson, "Towards Availability Benchmarks: A Case Study of Software RAID Systems", in Proc. 2000 USENIX Annual Technical Conf., San Diego, CA, USA, 2000 (USENIX Association).
[8] J. Arlat, J.-C. Fabre, M. Rodríguez and F. Salles, "Dependability of COTS Microkernel-based Systems", IEEE Trans. on Computers, vol. 51, no. 2, pp. 138-163, February 2002.
[9] H. Madeira et al., "Conceptual Framework — Preliminary Dependability Benchmark Framework", Dependability Benchmarking (DBench) Project, IST 2000-25425, Deliverable CF2, LAAS-CNRS, Toulouse, France, 2001 (see also http://www.laas.fr/dbench).
[10] J. Gray (Ed.), The Benchmark Handbook, San Francisco, CA, USA: Morgan Kaufmann Publishers, 1993.
[11] R. Chillarege and N. S. Bowen, "Understanding Large System Failures — A Fault Injection Experiment", in Proc. FTCS-19, Chicago, IL, USA, 1989, pp. 356-363 (IEEE CS Press).
[12] M. Daran and P. Thévenod-Fosse, "Software Error Analysis: A Real Case Study Involving Real Faults and Mutations", in Proc. ISSTA '96, San Diego, CA, USA, 1996, pp. 158-171 (ACM Press).
[13] H. Madeira, D. Costa and M. Vieira, "On the Emulation of Software Faults by Software Fault Injection", in Proc. DSN-2000, New York, NY, USA, 2000, pp. 417-426 (IEEE CS Press).
[14] C. R. Yount and D. P. Siewiorek, "A Methodology for the Rapid Injection of Transient Hardware Errors", IEEE Trans. on Computers, vol. 45, no. 8, pp. 881-891, August 1996.
[15] D. T. Stott, G. Ries, M.-C. Hsueh and R. K. Iyer, "Dependability Analysis of a High-Speed Network Using Software-Implemented Fault Injection and Simulated Fault Injection", IEEE Trans. on Computers, vol. 47, no. 1, pp. 108-119, January 1998.
[16] P. Folkesson, S. Svensson and J. Karlsson, "A Comparison of Simulation Based and Scan Chain Implemented Fault Injection", in Proc. FTCS-28, Munich, Germany, 1998, pp. 284-293 (IEEE CS Press).